This study forms a technical report on various tasks carried out on the materials collected and published by the Finnish ethnographer and linguist Matthias Alexander Castrén (1813-1852). The Finno-Ugrian Society is publishing Castrén's manuscripts as new critical and digital editions, and at the same time different research groups have turned their attention to these materials. We discuss the workflows and technical infrastructure used, and consider how datasets that benefit different computational tasks can be created to further improve the usability of these materials and to help in the further processing of similar archival collections. We focus on the parts of the collection that have been processed in a way that improves their usability in more technical applications, complementing earlier work on the cultural and linguistic aspects of these materials. Most of these datasets are openly available on Zenodo. The study points to specific areas where further research is needed, and provides benchmarks for text recognition tasks.
We present the first openly available corpus for detecting depression in Thai. Our corpus is compiled from expert-verified cases of depression on several online blogs. We experiment with two different LSTM-based models and two different BERT-based models, and achieve an accuracy of 77.53% in detecting depression. This establishes a good benchmark for future researchers working on the same corpus. In addition, we identify the need for Thai word embeddings trained on a more diverse corpus than Wikipedia. Our corpus, code, and trained models are openly released on Zenodo.
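A minimal sketch of the kind of BERT-based binary classifier described, using the Hugging Face transformers API; the checkpoint, hyperparameters, and label scheme are illustrative assumptions, not the models used in the paper.

```python
# Sketch: fine-tuning a BERT-style sequence classifier for depression detection.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"  # placeholder; a Thai-specific checkpoint may work better
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["ตัวอย่างข้อความจากบล็อก"]       # hypothetical blog snippet
labels = torch.tensor([1])                 # assumed labels: 1 = depression, 0 = control

batch = tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)        # cross-entropy loss computed internally
out.loss.backward()
optimizer.step()
```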
Finnish is a language with multiple dialects that differ from one another not only in accent (pronunciation) but also in morphological forms and lexical choice. We present a method for automatically detecting a speaker's dialect based on dialect transcripts and on transcripts paired with audio recordings, using a dataset consisting of 23 different dialects. Our results show that the best accuracy is obtained by combining the two modalities: text alone reaches an overall accuracy of only 57%, whereas text and audio together reach 85%. Our code, models, and data are openly released on GitHub and Zenodo.
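A minimal sketch of combining a text-based and an audio-based dialect classifier by late fusion; the encoders, feature sizes, and fusion weights are illustrative assumptions, not the architecture used in the paper.

```python
# Sketch: late fusion of text and audio features for 23-way dialect classification.
import torch
import torch.nn as nn

NUM_DIALECTS = 23

class LateFusionDialectClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=1024):
        super().__init__()
        self.text_head = nn.Linear(text_dim, NUM_DIALECTS)
        self.audio_head = nn.Linear(audio_dim, NUM_DIALECTS)

    def forward(self, text_feat, audio_feat):
        # Average the per-modality logits; a text-only or audio-only prediction
        # falls out by dropping one of the two terms.
        return 0.5 * (self.text_head(text_feat) + self.audio_head(audio_feat))

clf = LateFusionDialectClassifier()
text_feat = torch.randn(4, 768)    # e.g. sentence embeddings of the transcripts
audio_feat = torch.randn(4, 1024)  # e.g. pooled acoustic features of the recordings
dialect_logits = clf(text_feat, audio_feat)  # shape (4, 23)
```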
We develop Bayesian neural networks (BNNs) that allow us to model generic nonlinearities and time variation for (possibly large sets of) macroeconomic and financial variables. From a methodological point of view, we allow for a general specification of networks that can be applied to either dense or sparse datasets, and combines various activation functions, a possibly very large number of neurons, and stochastic volatility (SV) for the error term. From a computational point of view, we develop fast and efficient estimation algorithms for the general BNNs we introduce. From an empirical point of view, we show both with simulated data and with a set of common macro and financial applications that our BNNs can be of practical use, particularly so for observations in the tails of the cross-sectional or time series distributions of the target variables.
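A minimal sketch of the type of model described: a neural regression function plus a stochastic-volatility error term. It simulates from an assumed specification (one hidden layer, random-walk log-volatility) and does not implement the paper's Bayesian estimation algorithms.

```python
# Sketch: simulate y_t = f(x_t) + eps_t with eps_t ~ N(0, exp(h_t)) and h_t a random walk.
import numpy as np

rng = np.random.default_rng(0)
T, K, H = 200, 5, 16                         # time periods, predictors, neurons (illustrative)

X = rng.normal(size=(T, K))                  # predictors
W1, b1 = 0.3 * rng.normal(size=(K, H)), np.zeros(H)
W2, b2 = 0.3 * rng.normal(size=H), 0.0

def nn_mean(x):
    # Generic nonlinearity: one hidden layer with tanh activation (an assumption).
    return np.tanh(x @ W1 + b1) @ W2 + b2

h = np.zeros(T)                              # log-volatility, h_t = h_{t-1} + eta_t
for t in range(1, T):
    h[t] = h[t - 1] + 0.1 * rng.normal()

y = nn_mean(X) + np.exp(0.5 * h) * rng.normal(size=T)
```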
Neural Radiance Fields (NeRFs) are coordinate-based implicit representations of 3D scenes that use a differentiable rendering procedure to learn a representation of an environment from images. This paper extends NeRFs to handle dynamic scenes in an online fashion. We do so by introducing a particle-based parametric encoding, which allows the intermediate NeRF features -- now coupled to particles in space -- to be moved with the dynamic geometry. We backpropagate the NeRF's photometric reconstruction loss into the position of the particles in addition to the features they are associated with. The position gradients are interpreted as particle velocities and integrated into positions using a position-based dynamics (PBD) physics system. Introducing PBD into the NeRF formulation allows us to add collision constraints to the particle motion and creates future opportunities to add other movement priors into the system such as rigid and deformable body constraints. We show that by allowing the features to move in space, we incrementally adapt the NeRF to the changing scene.
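A minimal sketch of the update loop described: backpropagate a photometric loss into particle positions, read the position gradients as velocities, and integrate them with a simple position-based-dynamics step. The loss, step sizes, and the floor constraint are illustrative placeholders, not the paper's implementation.

```python
# Sketch: gradients on particle positions interpreted as velocities, with a PBD-style constraint.
import torch

N, F = 1024, 32
positions = torch.randn(N, 3, requires_grad=True)   # particle positions carrying NeRF features
features = torch.randn(N, F, requires_grad=True)

def photometric_loss(positions, features):
    # Placeholder for rendering the particle-encoded NeRF against the latest frame;
    # a toy objective is used here so the example runs end to end.
    return (positions.norm(dim=1) - 1.0).pow(2).mean() + 1e-3 * features.pow(2).mean()

loss = photometric_loss(positions, features)
loss.backward()                                      # fills positions.grad and features.grad

dt, gain = 1.0 / 30, 0.1
with torch.no_grad():
    velocities = -gain * positions.grad              # interpret position gradients as velocities
    predicted = positions + dt * velocities          # explicit integration step
    predicted[:, 1].clamp_(min=0.0)                  # PBD-style constraint: stay above a floor plane
    positions.copy_(predicted)                       # features.grad would feed a separate optimizer step
    positions.grad.zero_()
```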
Skill-based reinforcement learning (RL) has emerged as a promising strategy to leverage prior knowledge for accelerated robot learning. Skills are typically extracted from expert demonstrations and are embedded into a latent space from which they can be sampled as actions by a high-level RL agent. However, this skill space is expansive, and not all skills are relevant for a given robot state, making exploration difficult. Furthermore, the downstream RL agent is limited to learning structurally similar tasks to those used to construct the skill space. We firstly propose accelerating exploration in the skill space using state-conditioned generative models to directly bias the high-level agent towards only sampling skills relevant to a given state based on prior experience. Next, we propose a low-level residual policy for fine-grained skill adaptation enabling downstream RL agents to adapt to unseen task variations. Finally, we validate our approach across four challenging manipulation tasks that differ from those used to build the skill space, demonstrating our ability to learn across task variations while significantly accelerating exploration, outperforming prior works. Code and videos are available on our project website: https://krishanrana.github.io/reskill.
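A minimal sketch of the two ideas described: a state-conditioned prior that biases skill sampling toward relevant skills, and a low-level residual policy added on top of the decoded skill action. Network sizes and the skill decoder are illustrative placeholders, not the architecture from the paper.

```python
# Sketch: sample a skill from a state-conditioned prior, decode it, and add a residual correction.
import torch
import torch.nn as nn

STATE_DIM, SKILL_DIM, ACTION_DIM = 17, 8, 4

skill_prior = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 2 * SKILL_DIM))
skill_decoder = nn.Sequential(nn.Linear(SKILL_DIM + STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
residual_policy = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

state = torch.randn(1, STATE_DIM)

mean, log_std = skill_prior(state).chunk(2, dim=-1)
skill = mean + log_std.exp() * torch.randn_like(mean)        # sample a state-relevant skill

base_action = skill_decoder(torch.cat([skill, state], dim=-1))
action = base_action + residual_policy(torch.cat([state, base_action], dim=-1))  # fine-grained adaptation
```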
Recognizing a word shortly after it is spoken is an important requirement for automatic speech recognition (ASR) systems in real-world scenarios. As a result, a large body of work on streaming audio-only ASR models has been presented in the literature. However, streaming audio-visual automatic speech recognition (AV-ASR) has received little attention in earlier works. In this work, we propose a streaming AV-ASR system based on a hybrid connectionist temporal classification (CTC)/attention neural network architecture. The audio and the visual encoder neural networks are both based on the conformer architecture, which is made streamable using chunk-wise self-attention (CSA) and causal convolution. Streaming recognition with a decoder neural network is realized by using the triggered attention technique, which performs time-synchronous decoding with joint CTC/attention scoring. For frame-level ASR criteria, such as CTC, a synchronized response from the audio and visual encoders is critical for a joint AV decision making process. In this work, we propose a novel alignment regularization technique that promotes synchronization of the audio and visual encoder, which in turn results in better word error rates (WERs) at all SNR levels for streaming and offline AV-ASR models. The proposed AV-ASR model achieves WERs of 2.0% and 2.6% on the Lip Reading Sentences 3 (LRS3) dataset in an offline and online setup, respectively, which both present state-of-the-art results when no external training data are used.
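A minimal sketch of a chunk-wise self-attention mask of the kind that makes an encoder streamable: frames may attend within their own chunk and to a limited number of past chunks, but never to future chunks. Chunk size and left context are illustrative assumptions.

```python
# Sketch: boolean mask for chunk-wise (streamable) self-attention.
import torch

def chunkwise_attention_mask(num_frames, chunk_size=16, left_chunks=2):
    idx = torch.arange(num_frames)
    chunk_id = idx // chunk_size
    q_chunk = chunk_id.unsqueeze(1)            # chunk of the query frame
    k_chunk = chunk_id.unsqueeze(0)            # chunk of the key frame
    # Allowed: same chunk, or up to `left_chunks` chunks in the past; no future chunks.
    return (k_chunk <= q_chunk) & (k_chunk >= q_chunk - left_chunks)

mask = chunkwise_attention_mask(64)
# mask[i, j] is True where frame i may attend to frame j; disallowed positions are
# typically filled with -inf before the attention softmax.
```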
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement, and (3) embodied vision-and-language. We discuss the dominant datasets within each theme, evaluation metrics for the challenges, and the performance of state-of-the-art models. We highlight commonalities between top approaches to the challenges and identify potential future directions for Embodied AI research.
We show that model uncertainty in Neural Radiance Fields (NeRFs) can be quantified effectively if a density-aware epistemic uncertainty term is considered. The naive ensembles investigated in prior work simply render RGB images to quantify the model uncertainty arising from differing explanations of the observed scene. In contrast, we additionally consider the termination probabilities along individual rays to identify the epistemic model uncertainty that stems from a lack of knowledge about the parts of the scene unobserved during training. We achieve new state-of-the-art performance on the established uncertainty quantification benchmark for NeRFs, outperforming methods that require complex changes to the NeRF architecture and training regime. We furthermore show that NeRF uncertainty can be used for next-best view selection and model refinement.
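A minimal sketch of the quantities involved: standard NeRF ray weights and a ray's total termination probability, with the leftover probability mass read as a density-aware signal that the ray traversed poorly observed space. The paper's exact uncertainty term is not reproduced here; this only illustrates the ingredients.

```python
# Sketch: per-ray termination probability from densities sampled along the ray.
import torch

def ray_termination(sigma, delta):
    # sigma: (num_samples,) densities along the ray; delta: (num_samples,) interval lengths.
    alpha = 1.0 - torch.exp(-sigma * delta)                              # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha]), dim=0)[:-1]
    weights = trans * alpha                                              # per-sample termination probability
    p_terminate = weights.sum()                                          # probability the ray stops in the scene
    return weights, p_terminate

sigma = torch.rand(64) * 5.0
delta = torch.full((64,), 0.02)
weights, p_terminate = ray_termination(sigma, delta)
unobserved_mass = 1.0 - p_terminate   # large value: the ray mostly passed through unmodelled space
```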
Many high-performing works on out-of-distribution (OOD) detection use real or synthetically generated outlier data to regularise model confidence; however, they typically require retraining the base network or specialised model architectures. Our work shows that noisy inlier embeddings make great outliers (NIMGO) in the challenging domain of OOD object detection. We hypothesise that synthetic outliers need only be minimally perturbed variants of the in-distribution (ID) data in order to train a discriminator to identify OOD samples, without costly retraining of the base network. To test our hypothesis, we generate a synthetic outlier set by applying additive-noise perturbations at the image or bounding-box level. An auxiliary feature-monitoring multilayer perceptron (MLP) is then trained to detect OOD feature representations, using the perturbed ID samples as a proxy. During testing, we show that the auxiliary MLP distinguishes ID samples from OOD samples at a state-of-the-art level on the OpenImages dataset. Extensive additional ablations provide empirical evidence in support of our hypothesis.
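A minimal sketch of the training recipe described: perturb ID samples with additive noise, extract features with a frozen backbone, and train an auxiliary MLP to separate clean from perturbed features as a proxy for ID versus OOD. The backbone, noise scale, and feature dimension are illustrative assumptions, not the detector used in the paper.

```python
# Sketch: auxiliary OOD discriminator trained on clean vs noise-perturbed ID features.
import torch
import torch.nn as nn

FEAT_DIM = 256
backbone = nn.Sequential(nn.Linear(3 * 32 * 32, FEAT_DIM), nn.ReLU())   # stand-in for a frozen detector
for p in backbone.parameters():
    p.requires_grad_(False)                                              # no base-network retraining

ood_mlp = nn.Sequential(nn.Linear(FEAT_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(ood_mlp.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

id_images = torch.rand(32, 3 * 32 * 32)                                  # flattened ID crops (placeholder data)
noisy_images = id_images + 0.3 * torch.randn_like(id_images)             # additive-noise "synthetic outliers"

feats = backbone(torch.cat([id_images, noisy_images]))
labels = torch.cat([torch.zeros(32, 1), torch.ones(32, 1)])              # 0 = ID proxy, 1 = outlier proxy

loss = bce(ood_mlp(feats), labels)
loss.backward()
optimizer.step()
```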